Skip to content

feat: sliding-window frame parser with per-type resync tracking#47

Merged
marklynch merged 1 commit into
marklynch:mainfrom
lawther:feat/sliding_window
Jun 25, 2026
Merged

feat: sliding-window frame parser with per-type resync tracking#47
marklynch merged 1 commit into
marklynch:mainfrom
lawther:feat/sliding_window

Conversation

@lawther

@lawther lawther commented Jun 25, 2026

Copy link
Copy Markdown
Contributor

This patch got a little larger than I initially planned. The guts (in framing.c) is reasonably straightforward. I did break it out into its own framing.c file so I could write all the tests against it. The tests are reasonably comprehensive - they test resync/recovery, and as a bonus it's now easier to test reassembly across multiple transport reads, so I added that too.

I upgraded the output in the 'unknown messages' UI to display all the resync types. It might be a bit noisy, but it allows eyeballing to see what the common causes of resync are.


Replace the all-or-nothing frame parser with a sliding-window resync: on a bad header checksum, control bytes, length, end byte, or data checksum the parser discards a single byte and retries, recovering valid frames from stutters, cross-device collisions, and line noise (issue #46).

Framing logic moves into a dedicated, host-testable framing.c module; error accounting is reworked from message-error counters into per-type resync counters surfaced on the status page and tagged through the unknown-message buffer.

Tests: golden-file harness seeded with the real captures from the issue, plus chunked-input cases (| splits one case into multiple add_bytes calls) and a chunk-invariance property sweep proving frame recovery is independent of how the byte stream is fragmented across reads.

Closes #46

Replace the all-or-nothing frame parser with a sliding-window resync: on a
bad header checksum, control bytes, length, end byte, or data checksum the
parser discards a single byte and retries, recovering valid frames from
stutters, cross-device collisions, and line noise (issue marklynch#46).

Framing logic moves into a dedicated, host-testable framing.c module; error
accounting is reworked from message-error counters into per-type resync
counters surfaced on the status page and tagged through the unknown-message
buffer.

Tests: golden-file harness seeded with the real captures from the issue,
plus chunked-input cases (| splits one case into multiple add_bytes calls)
and a chunk-invariance property sweep proving frame recovery is independent
of how the byte stream is fragmented across reads.

@marklynch marklynch left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work — the sliding-window resync is a clear improvement, the framing logic reads correctly (loop terminates, bounds are safe, capture/dedup behavior is sound), and pulling it into a host-testable framing.c with golden + chunk-invariance tests is great. A few notes:

1. unknown_msgs_json_handler holds the unknown-buffer mutex across the whole JSON build (medium)

unknown_buffer_lock_for_read() keeps the mutex held while serializing up to UNKNOWN_BUFFER_CAPACITY entries (cJSON allocations, snprintf, get_device_name). Meanwhile unknown_buffer_record() runs on the hot RX/resync path with a 100 ms timeout and drops the record on contention. Loading the Unknown Messages page while the bus is noisy can therefore both lose captures and add latency to the bridge task. Consider snapshotting the entries under the lock, releasing, then building the JSON from the copy.

2. resync_wrapper counter switch has no default (low)

The counter-incrementing switch (type) lacks a default, while the reason-mapping switch just below it has one. If a new tcp_bridge_resync_type_t is added later, resyncs_total still increments but no per-type counter does, so total would no longer equal the sum of the breakdown (and there's no compile-time nudge to catch it).

3. decode_message no longer validates header/length/data checksums (low)

Validation now lives solely in the framing layer, which is correct for RX since framing guarantees validity before dispatch. But decode_message is also called directly on the TX-echo path (tcp_bridge.c:330, user-typed hex) and from the test harness, so malformed input there is now decoded with no checksum warning where it previously was flagged. Low impact — mostly worth noting the new "assumes pre-validated input" invariant for decode_message.


Generated by Claude Code

@marklynch

Copy link
Copy Markdown
Owner

This is great work and should make it a way better. Thanks.

Claude review above, the key one I think should address is the first one, as dropping packets could lead to a whole world of pain debugging, particularly if one of the debug pages causes it.

@marklynch

Copy link
Copy Markdown
Owner

Have just seen #48 also, so if you want to stack any of the suggestions below on that can merge it all then.

@marklynch marklynch merged commit 954c68d into marklynch:main Jun 25, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

frame parser - implement a sliding window?

2 participants